

Search for: All records

Creators/Authors contains: "Lostanlen, V"


  1. With the aim of constructing a biologically plausible model of machine listening, we study the representation of a multicomponent stationary signal by a wavelet scattering network. First, we show that renormalizing second-order nodes by their first-order parents gives a simple numerical criterion to establish whether two neighboring components will interfere psychoacoustically. Second, we generalize the “one or two components” framework to three sine waves or more, and show that a network of depth M = log2 N suffices to characterize the relative amplitudes of the first N terms in a Fourier series, while enjoying properties of invariance to frequency transposition and component-wise phase shifts.
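A toy numpy/scipy sketch of the underlying phenomenon (not the scattering network itself; the sampling rate and component frequencies are illustrative): two sine components close enough to fall within the same wavelet band beat at their difference frequency, which is exactly the amplitude modulation that second-order scattering nodes capture.

```python
import numpy as np
from scipy.signal import hilbert

sr = 8000
t = np.arange(sr) / sr  # one second of signal
f1, f2 = 400.0, 430.0   # two neighboring components
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

# The analytic envelope of the sum beats at |f1 - f2| = 30 Hz.
envelope = np.abs(hilbert(x))
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
beat_hz = np.argmax(spectrum)  # rfft bins are 1 Hz apart for a 1 s signal
print(beat_hz)  # → 30
```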
  2. This article explains how to apply time–frequency scattering, a convolutional operator extracting modulations in the time–frequency domain at different rates and scales, to the re-synthesis and manipulation of audio textures. After implementing phase retrieval in the scattering network by gradient backpropagation, we introduce scale-rate DAFx, a class of audio transformations expressed in the domain of time–frequency scattering coefficients. One example of scale-rate DAFx is chirp rate inversion, which causes each sonic event to be locally reversed in time while leaving the arrow of time globally unchanged. Over the past two years, our work has led to the creation of four electroacoustic pieces: FAVN; Modulator (Scattering Transform); Experimental Palimpsest; Inspection (Maida Vale Project) and Inspection II; as well as XAllegroX (Hecker Scattering.m Sequence), a remix of Lorenzo Senni’s XAllegroX, released by Warp Records on a vinyl entitled The Shape of RemiXXXes to Come.
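The effect of chirp rate inversion can be mimicked crudely in the time domain (a stand-in for the actual scattering-domain transformation, assuming event boundaries are known): each event is reversed locally while the sequence of events keeps its global order.

```python
import numpy as np

def locally_reverse(signal, boundaries):
    """Reverse each event of `signal` locally while keeping the
    global order of events. `boundaries` lists segment edges."""
    out = signal.copy()
    for start, stop in zip(boundaries[:-1], boundaries[1:]):
        out[start:stop] = out[start:stop][::-1]
    return out

x = np.arange(12)                       # stand-in for three 4-sample events
y = locally_reverse(x, [0, 4, 8, 12])
print(y)  # → [ 3  2  1  0  7  6  5  4 11 10  9  8]
```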
  3. Vibratos, tremolos, trills, and flutter-tongue are techniques frequently found in vocal and instrumental music. A common feature of these techniques is the periodic modulation in the time–frequency domain. We propose a representation based on time–frequency scattering to model the inter-class variability for fine discrimination of these periodic modulations. Time–frequency scattering is an instance of the scattering transform, an approach for building invariant, stable, and informative signal representations. The proposed representation is calculated around the wavelet subband of maximal acoustic energy, rather than over all the wavelet bands. To demonstrate the feasibility of this approach, we build a system that computes the representation as input to a machine learning classifier. Whereas previously published datasets for playing technique analysis focus primarily on techniques recorded in isolation, for ecological validity, we create a new dataset to evaluate the system. The dataset, named CBF-periDB, contains full-length expert performances on the Chinese bamboo flute that have been thoroughly annotated by the players themselves. We report F-measures of 99% for flutter-tongue, 82% for trill, 69% for vibrato, and 51% for tremolo detection, and provide explanatory visualisations of scattering coefficients for each of these techniques.
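A minimal numpy sketch of the "maximal-energy subband" idea (using a plain short-time Fourier magnitude rather than a wavelet filterbank, and a synthetic tremolo; all parameter values are assumptions): restrict attention to the subband carrying most energy, then read the modulation rate off that subband's envelope.

```python
import numpy as np

sr, dur = 8000, 4.0
t = np.arange(int(sr * dur)) / sr
# Tremolo: 440 Hz tone, amplitude-modulated at 6 Hz.
x = (1.0 + 0.5 * np.sin(2 * np.pi * 6 * t)) * np.sin(2 * np.pi * 440 * t)

# Short-time Fourier magnitudes, hop = frame = 256 samples (no overlap).
frame = 256
frames = x[: len(x) // frame * frame].reshape(-1, frame)
S = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1)).T  # (freq, time)

# Restrict to the subband of maximal energy, as in the paper.
band = np.argmax(S.sum(axis=1))
trajectory = S[band] - S[band].mean()

# Modulation rate = dominant frequency of that subband's envelope.
frame_rate = sr / frame  # 31.25 frames per second
mod_spectrum = np.abs(np.fft.rfft(trajectory))
rate_hz = np.argmax(mod_spectrum) * frame_rate / len(trajectory)
print(round(rate_hz, 2))  # → 6.0
```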
  4. We explore computational strategies for matching human vocal imitations of birdsong to actual birdsong recordings. We recorded human vocal imitations of birdsong and subsequently analysed these data using three categories of audio features for matching imitations to original birdsong: spectral, temporal, and spectrotemporal. These exploratory analyses suggest that spectral features can help distinguish imitation strategies (e.g. whistling vs. singing) but are insufficient for distinguishing species. Similarly, whereas temporal features are correlated between human imitations and natural birdsong, they are also insufficient. Spectrotemporal features showed the greatest promise, in particular when used to extract a representation of the pitch contour of birdsong and human imitations. This finding suggests a link between the task of matching human imitations to birdsong and retrieval tasks in the music domain such as query-by-humming and cover song retrieval; we borrow from such existing methodologies to outline directions for future research.
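A sketch of pitch-contour matching in the query-by-humming spirit (the contours below are made up, and this is not the paper's pipeline): mean-centering log-frequency contours makes the comparison invariant to transposition, so an imitation sung two octaves below the bird still matches, and dynamic time warping tolerates timing differences.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D contours."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def normalize(f0_hz):
    """Log-frequency contour, mean-centered so that a transposed
    imitation (sung in a different register) still matches."""
    c = np.log2(f0_hz)
    return c - c.mean()

birdsong = np.array([2000.0, 2400, 2800, 2400, 2000])  # made-up contour
imitation = np.array([500.0, 600, 700, 600, 500])      # same shape, 2 octaves down
other = np.array([3000.0, 2500, 2000, 2500, 3000])     # inverted contour

d_match = dtw_distance(normalize(birdsong), normalize(imitation))
d_other = dtw_distance(normalize(birdsong), normalize(other))
print(d_match < d_other)  # → True
```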
  5. This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also has beneficial effects in enhancing animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalizes logarithm-based spectral flux, yet with a tunable time scale for background noise estimation. In comparison with pointwise logarithm, PCEN reduces false alarm rate by 50x in the near field and 5x in the far field, both on avian and marine bioacoustic datasets. Such improvements come at moderate computational cost and require no human intervention, thus heralding a promising future for PCEN in bioacoustics.
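A minimal numpy sketch of PCEN on a magnitude spectrogram E (filterbank details omitted; the parameter values are assumptions roughly following common defaults such as those of librosa.pcen):

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a magnitude
    spectrogram E of shape (channels, frames).

    A first-order IIR filter tracks the background level M per
    channel; its time constant is set by the smoothing coefficient
    `s`, the tunable time scale mentioned above. Each frame is
    divided by M**alpha (adaptive gain control), then compressed
    by a stabilized root (delta, r)."""
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

# A step change in level: PCEN responds strongly to the onset, then
# adapts back toward baseline instead of tracking the absolute level.
E = np.concatenate([np.ones((1, 100)), 10 * np.ones((1, 100))], axis=1)
P = pcen(E)
```

The onset frame P[0, 100] comes out much larger than the adapted steady state P[0, 199], which is the transient-enhancing behavior that suppresses stationary background noise.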
  6. SONYC Urban Sound Tagging (SONYC-UST) is a dataset for the development and evaluation of machine listening systems for real-world urban noise monitoring. It consists of 3068 audio recordings from the “Sounds of New York City” (SONYC) acoustic sensor network. Via the Zooniverse citizen science platform, volunteers tagged the presence of 23 fine-grained classes that were chosen in consultation with the New York City Department of Environmental Protection. These 23 fine-grained classes can be grouped into eight coarse-grained classes. In this work, we describe the collection of this dataset, metrics used to evaluate tagging systems, and the results of a simple baseline model.
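The fine-to-coarse grouping can be expressed as a logical OR over each coarse group's fine tags. The group indices below are illustrative, not the actual SONYC-UST taxonomy (which maps 23 fine classes onto 8 coarse ones):

```python
import numpy as np

# Hypothetical grouping: each coarse class owns a list of
# fine-class column indices.
coarse_groups = [[0, 1, 2], [3, 4], [5, 6, 7]]

def coarsen(fine_tags, groups):
    """Collapse binary fine-grained tags (clips, fine_classes)
    into coarse tags: a coarse class is present in a clip if any
    of its fine classes is present."""
    return np.stack(
        [fine_tags[:, g].any(axis=1) for g in groups], axis=1
    )

fine = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
                 [0, 0, 0, 1, 1, 0, 0, 0]], dtype=bool)
coarse = coarsen(fine, coarse_groups)
print(coarse.astype(int))  # → [[1 0 1]
                           #    [0 1 0]]
```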